-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat(opentelemetry source)!: Improve HTTP error signalling #22957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This will allow us to support json in a later stage, as the content type of the reply must match what the client used in the request. This also improves error messages a bit, as its now a 415 instead of a 500. A 500 internal server error is quite confusing when debugging. There is no error with the server, its the client sending the wrong request.
This tests the expected behaviour of the Open Telemetry source wrt error handling. Currently vector mostly sends 500 Internal Server errors, which isnt that helpful when debugging. It also means some retryable errors are not retried. The source implementation is done in the next step, so all tests fail.
Reject and recover indicates that the current filter didn't match but that others might[1]. This means reject is not the appropriate tool for handling e.g. deserialization errors. This commit migrates from rejection to returning the reply directly, and at the same time updates it to the new status codes to make sure it passes the new tests. This change also makes it possible to support `application/json` in the future. The `content-type` of the reply must match that of the request, and as we now have access to the request headers we can make sure they match. [1]: seanmonstar/warp#388 (comment)
} | ||
|
||
#[tokio::test] | ||
async fn delivery_error() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use some help fixing this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this failing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its failing to fail.
I want vector to fail delivery and end up in the BatchStatus::Errored
code path, but it doesn't seem to
That's true, we will a changelog explaining why this is a desired change. |
@pront gentle ping. What would be the best way to move this forward? |
We will need a changelog describing the breaking change (before & after). Otherwise this looks good. |
message: err_msg.message().into(), | ||
..Default::default() | ||
}); | ||
let mut events = events.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid unwrap()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Is there a nice syntax trick? The if let Err()
above should guarantee this unwrap is safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, you can use is_err()
and return early. Or:
let mut events = match events {
Ok(events) => ...,
Err(err) => { ... }
}
|
||
let mut resp = reply.into_response(); | ||
resp.headers_mut() | ||
.insert(http::header::CONTENT_TYPE, "text/plain".parse().unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid unwrap
} | ||
|
||
#[tokio::test] | ||
async fn delivery_error() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this failing?
Some(receiver) => match receiver.await { | ||
BatchStatus::Delivered => Ok(protobuf(resp).into_response()), | ||
BatchStatus::Errored => reply_with_status( | ||
hyper::StatusCode::INTERNAL_SERVER_ERROR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to tell why it errored and whether it makes sense to return a 'retriable' status code (e.g. 502)
Ill pick this up again next week |
Summary
This PR actually does two things:
/v1/logs
is easier to debug than a 500.Another benefit from 1. is that it makes it easier to add JSON support in the future, we just need to pass the content-type header along. With rejects thats hard as the reject handler doesn't have access to headers. This will be done in a follow up.
This PR is half of #22875, making it easier to review.
I'm not sure if the HTTP status code behaviour is considered breaking, I assume it is so I marked it as such
Change Type
Is this a breaking change?
How did you test this PR?
Both tests and using it
Does this PR include user facing changes?
Notes
@vectordotdev/vector
to reach out to us regarding this PR.pre-push
hook, please see this template.cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace
(alternatively, you can runcargo test --all
)./scripts/check_changelog_fragments.sh
git merge origin master
andgit push
.Cargo.lock
), pleaserun
cargo vdev build licenses
to regenerate the license inventory and commit the changes (if any). More details here.References
#22940